Introduction to the Distribution program

This section introduces you to the Distribution program. Topics include:

Overview

Trip distribution is the process of estimating the number trips that will travel between all the zones in the system. Usually the process uses the number of trip ends in each zone as the starting point. These margin totals are distributed to the rows and column of a generated matrix. Usually, additional information about some measure of travel between each zone pair should be included in the process. The most commonly used distribution process is the "gravity" model, but other processes are also employed. CUBE Voyager does not have any automatic, or default, trip distribution process. The Matrix program provides a framework that allows the user to perform distribution in a variety of ways. In some cases, the Matrix program does have some built-in functions that aid in the implementation of the more popular distribution processes. A gravity model distribution can be easily implemented within Matrix, but it does need some help, and the user has to follow certain guidelines for proper implementation. A brief illustration of a typical gravity distribution follows. The gravity model equation is:

TRIPi-j = Pi* Aj * func(Ti-j) / Sz=1..n (A * func(T)).

where:

P = the number of trip productions for a zone.

A = the number of trip attractions for a zone.

T = the travel impedance factor between zones. i = the production zone.

j = the attraction zone.

z = any zone.

n = the number of zones.

In simple terms this states that the trip productions in zone I will be distributed to each zone J according to the relative attractiveness of zone J. Each J’s attractiveness is determined by the product of its attractions and some function of the spatial separation between I and J. The sum of the these products for all Js (relative to I) is obtained and often referred to as the accessibility index for I. Each J will then be given a prorata share of the productions for I based upon its attractiveness relative to the accessibility index for I.

The function of spatial separation (func(T)) is the indefinite portion of the equation. It is believed to be best described by an expression of the form of e^bT, where b is the curve factor, and T is the travel time. Experience has shown that often a single curve does not suffice, and it difficult to estimate what b should be used. To facilitate application, most gravity model processes use a lookup function to obtain empirical values for the function based upon the impedance. These curves are usually called friction factor curves. Numerous applications have shown that friction factor curves tend to follow the same patterns for similar conditions. Many agencies share the same curves when their regions are similar in nature.

To implement the gravity model in Matrix, several guidelines must be followed:

There must be a set of Ps and As for each purpose to be estimated. They are established by the presence of a SETPA control statement.
A mechanism must be established to obtain the travel function. Usually, a level-of-service matrix is used to obtain the zone-to- zone impedance for I-J, and the travel function value (friction factor) is obtained by finding the impedance in a LOOKUP table.
The gravity model equations must be executed in a specific order. At least three expressions are required. One to compute attractiveness for each J, one to sum them to form the accessibility index for I, and one to distribute the productions based upon the values and the accessibility index. Obviously, these expressions must be executed in the proper order.

As an alternative to complete user definition of the gravity formulation, the GRAVITY control statements can be used to perform a built in process for a doubly constrained GRAVITY model. This process is more efficient than complete formulation. A singly constrained gravity model can be formulated as a destination- choice model with the CHOICE control statement in Matrix. See Examples and FAQ under the Matrix program for an example of a singly constrained gravity model.

Example

/* given:
The impedances are in the first matrix of file: IMP.MAT
The productions and attractions are in file: PA.ZDA
The friction factors are in file: FF.DAT
*/
FILEI MATI[1] = IMP.MAT,
ZDATI[1] = PA.ZDA,Z=#1,P1=#2,A1=#3
FILEO MATO[1] = OUT.MAT, MO=1
LOOKUP NAME=FF,LOOKUP[1]=#2,RESULT=#1,
FILE=FF.DAT, INTERPOLATE=Y
SETPA P[1]=ZI.1.P1, A[1]=ZI.1.A1
MW[1] = A[1] * FF(1,MI.1.1), PAF=P[1]/ROWSUM(1), MW[1]=PAF * MW[1]
GRAVITY PURPOSE=1, LOS=MI.1.1, FFACTORS=FF ; faster process

Convergence: Iteration control

The gravity model equation ensures that the correct number of trips will be distributed for each I zone; the row (production zone) totals for each will always match the number Ps for the zone. There is no method to guarantee that the correct column totals (number of attractions) will be obtained for each J zone. In all likelihood, the estimated column values will not match the desired number of As input for each zone. This problem is alleviated by iterating the model. An iteration is a complete pass for all I zones. At the end of each iteration, the estimated column totals are compared to the input As, and, based upon the comparison, the process is repeated with an adjustment in the data. The adjustment is based upon the closeness for each zone. The iteration process is repeated until it is decided that the results are close enough, or that a maximum number of iterations have been performed. Following is small example of this process:

Prior to the first iteration the desired totals are:

Zone    1  2  3   Total
1      --  --  -- 100
2      --  --  -- 200
3      --  --  -- 300
Total 240 200 160 600

There really is no matrix yet. The totals are the values as obtained from the SETPA control statements. The model goal is to fill in the matrix with values so that the totals will match the totals shown. The row totals shown are the Desired Ps for each zone, and the column totals are the Desired As. A set of A values internally named Used are set equal to the Desired values. An iteration is performed and the Estimated results appear as:

After Iteration 1:

Zone    1   2   3 Total
1      57  24  19 100
2      64 106  30 200
3     102  61 137 300
Total 223 191 186 600
Use2  258 210 138

As dictated by the formulation, the row totals are correct. But, the Estimated column totals do not quite match the Desired. The As for each zone are adjusted by a factor that is computed as: Desired * Used / Estimated. Another iteration is performed, and the results appear as:

After Iteration 2:

Zone    1   2   3 Total
1      60  24  16 100
2      67 108  25 200
3     113  66 121 300
Total 240 198 162 600
Use3  258 212 136

The Estimated column totals are still not exactly what is desired, so adjustments are made to the used As, and another iteration is performed.

After Iteration 3:

Zone    1   2   3 Total
1      60  24  16 100
2      66 109  25 200
3     114  67 119 300
Total 240 200 160 600

Now the Estimated totals and the Desired totals match for all zones. The model is completed. Further iterations would be useless; nothing would change.

In the real world, with many zones, it is not likely that all the Estimated totals would match exactly with the Desired totals. As a matter of fact, in this simple three zone problem, the totals did not match exactly (there were some slight differences in the decimal places). But, the floating point values were close enough to be accepted as having matched. How does the program decide when it should stop? There are two parameters that can be used for controlling cutoff: MAXITERS and MAXRMSE.

MAXITERS specifies that no more than a specified number of iterations are to be performed. The default is MAXITERS=3; this is usually sufficient for most gravity model applications. Additionally, the program computes the root mean square error of the differences in Estimated vs. Desired attractions (sqrt (Sum(diff^2) / (n-1)). If the computed RMSE is less than MAXRMSE, a completion flag is set. The default is MAXRMSE=10, but that may not always be a good choice for certain applications. In the sample three zone problem the RMSEs for each iteration were computed as: 22.78, 2.08, and 0.26. The final value of 0.26 indicates that there were still some slight differences in the Estimated and Desired, but as noted above, they were insignificant. If the default MAXRMSE value had been used, the process would have cutoff after two iterations.

Multiple purposes

Usually a given application will consist of more than one purpose. Traditionally, at least three internal purposes are estimated, with the most commonly used purposes being: Home Based Work (HBW), Home Based Other (HBO), and Non Home Based (NHB). There are other popular purpose stratification, but these are the most commonly used. Often Internal-External and External-Internal and truck and taxi purposes are also estimated; some analysts prefer to estimate them in a separate application, and some prefer to do them all at once. With the CUBE Voyager process, all the purposes can be estimated in one application because the user has the ability to specify different gravity equations for each of the purposes. Each purpose can have its own impedance matrix and different formulation (functions, expressions, friction factors, etc.). The Ps and As and lookup functions can be obtained from different sources.

There is one caveat: If convergence has not been reached for any purpose, and MAXITERS has not been reached, another iteration will be processed with adjusted As for all purposes. That is no concern; the results will not get worse. The MAXRMSE parameter applies to the highest RMSE of all purposes. MATO files are written only on the last iteration. If the iteration number is equal to the PARAMETERS MAXITERS value, the program knows when the last iteration is being processed. But, convergence could be obtained at some lower iteration. In that case the program will perform another iteration, similar to the one just completed, but this time writing the MATO files. In other words, if convergence is reached before MAXITERS, an additional iteration will be performed in order to obtain the matrices. The use of PARAMETERS MATOEVERY will preclude an additional iteration, but at the cost of additional time to write matrices during each iteration.

The number of purposes is determined by the highest P[] or A[] index established via the SETPA control statement. The program assumes that there will be purposes from one, monotonically, through that highest index. Other COMP MW[]= statements may be specified, but they will not be involved in any internal purpose editing. If the estimates follow the same formulation, the set up is only slightly different from what is depicted, above. Example 1 illustrates a typical setup for a three purpose application.

Example 1: Three purpose - same P/A and FF file for all purposes

LOOKUP FILE=FF.DAT, INTERPOLATE=Y, NAME=FF,
LOOKUP[1]=1,RESULT=2,
LOOKUP[2]=1,RESULT=3,
LOOKUP[3]=1,RESULT=4
SETPA P[1]=ZI.1.P1, A[1]=ZI.1.A1,
P[2]=ZI.1.P2, A[2]=ZI.1.A2,
P[3]=ZI.1.P3, A[3]=ZI.1.A3
LOOP PURP=1,3
MW[PURP] = A[PURP] * FF(PURP,MI.1.1)
PAF=P[PURP]/ROWSUM(PURP)
MW[PURP]=PAF*MW[PURP]
ENDLOOP

In Example 2, purposes 1-3 are the same as in Example 1, and two internal-external purposes (with entirely different formulation) are to also be included. Purpose 4 uses an equation for distribution, and purpose 5 uses a simple lookup curve, with only three points in it.

Example

LOOKUP FILE=FF.DAT, INTERPOLATE=Y, FAIL=12000,1,0,
NAME=FF,
LOOKUP[1]=1,RESULT=2,
LOOKUP[2]=1,RESULT=3,
LOOKUP[3]=1,RESULT=4
LOOKUP NAME=XIFF, FAIL=500,100,
R= '500 1-20', ; read the data directly
'400 20-30',
'300 30-40',
'200 40-50'
SETPA P[1]=ZI.1.P1, A[1]=ZI.1.A1,
P[2]=ZI.1.P2, A[2]=ZI.1.A2,
P[3]=ZI.1.P3, A[3]=ZI.1.A3
SETPA P[4]=ZI.2.PI, A[4]=ZI.3.STACNT_OB,
P[5]=ZI.2.AI, A[5]=ZI.3.STACNT_IB
LOOP PURP=1,3
IF (PURP <= 3)
MW[PURP] = A[PURP] * FF(PURP,MI.1.1)
ELSEIF (PURP==4)
JLOOP INCLUDE=1-500
IMP = MI.2.IMPEDNACE
IF (MI.2.DISTANCE < 3) IMP = MI.2.SHORTIMP
MW[4] = A[4]*EXP(-B * IMP)
ENDJLOOP
ELSE
MW[5] = A[5] * XIFF(MI.2.IMPEDANCE) INCLUDE=501-535
ENDIF
PAF=P[PURP] / ROWSUM(PURP),
MW[PURP]=PAF*MW[PURP]
ENDLOOP

Of course, there are different ways that this distribution could have been written. Most likely, the purpose 4 and 5 Ps and As would not sum to the same total because they were derived from separate sources. If there is a difference in the totals for a purpose, the program issues a message and adjusts the As to match the P total. Differences in the totals would preclude convergence via RMSE calculations because there would always be differences.

Referencing productions and attractions

Productions and attractions are referenced by using array notation for P and A. P and A are doubly dimensioned arrays, where the first index references the purpose number, and the second index references the zone number. Since most users may not wish to always code a zone index (it is more intuitive to reference the arrays with just purpose number), the zone index need not be supplied; it will be automatically provided. However, the zone index has different meanings for P and A. For P, the default zone index is I and for A, it is J. If a different index is required, it may be explicitly provided. Most times an invalid index will cause a fatal termination. Purpose 0 and Zone 0 are valid, but may not always be predictable. Zone 0 should always contain the total for the purpose. The lower bounds for both indices are zero, and the upper bounds are "number of purposes" and zones, respectively.

P and A are protected variables; they may not be the result of COMP expressions, except on SETPA statements.

Travel function values: Friction factors

If friction factors are to be used in the distribution process, they are normally input to the program from an external lookup file. The lookup file is usually formatted as a series of curves – one curve for each trip purpose. The curves are organized vertically on the records of the file. A typical file would appear as:

1 9000 8000 7000
2 8000 7000 6000
3 7000 6300 5200
:
:
50  1  1  1
51  0  0  0

This file would be described with a LOOKUP statement such as:

LOOKUP FILE=FF_FILE, NAME=FF,
  LOOKUP[1]=1, RESULT=2,
  LOOKUP[2]=1. RESULT=3,
  LOOKUP[3]=1, RESULT=4

This statement indicates that the first four values from each record will be used. Field 1 will be the travel time value, and fields 2, 3, and 4 are the values for purpose 1, 2, and 3, respectively. The friction factor for purpose 1 for 1 minute would be 9000, and the friction factor for purpose 2 would be 8000. The friction factor for purpose 1 for a time value of 1.5 could be either 9000 or 8500, depending upon the options specified on the LOOKUP statement. If SETUPPER=T is specified, the friction factor will be 9000, and if SETUPPER=F and INTERPOLATE=T, the friction will be 8500. This capability allows for compatibility with other model systems.

In this example, a time value of 0.5 will result in the value 9000, because there is no explicit FAIL[1] specified. The friction for any value greater than 51 will be 0; no distribution. If the record for 51 were not in the file, the friction factor for any time greater than 50 would be 1. However, if FAIL[2]=0 were used, any time greater than 50 would be 0. It is recommended that a final value of 0 be used to preclude distribution to zones where the I-J time exceeds a certain value. This can also be controlled by use of the LOSRANGE keyword on the GRAVITY statement.